The analysis below applies the SOGP framework to investigate Covid-19 excess deaths. To do so we utilize the recently published short-term mortality fluctuation (STMF) dataset published by the Human Mortality Database. The STMF was expressly created to accomodate the critical need of short-term mortality analyses in response to the Covid-19 pandamic. The STMF contains detailed country-specific mortality, segregated by week (52 weeks per year), sex, and 3 age groups in 25 countries:

Austria (AUT)
Belgium (BEL)
Bulgaria (BGR)
Switzerland (CHE)
Czech Republic (CZE)
Germany (DEUTNP)
Denmark (DNK)
Spain (ESP)
Estonia (EST)
Finland (FIN)
France (FRATNP)
England and Wales (GBRTENW)
Scotland (GBR_SCO)
Hungary (HUN)
Iceland (ISL)
Israel (ISL)
Italy (ITA)
Lithuania (LTU)
Luxembourg (LUX)
Netherlands (NLD)
Norway (NOR)
Portugal (PRT)
Slovakia (SVK)
Sweden (SWE)
United States (USA)

The dataset is frequently updated. The analysis below is based on datasets last updated on July 7, 2020:

deaths <- readr::read_csv("https://www.mortality.org/Public/STMF/Outputs/stmf.csv", skip=1)

For our analysis we work with mortality rates (provided in STMF as the RTotal field) transformed to the log scale. We furthermore apply a few adjustments:

deaths <- data.table(deaths)
deaths <- deaths[Sex=="b",.(Year,Week,RTotal,DTotal,CountryCode)] 
deaths <- deaths[,y:=log(RTotal)]

\(\color{#c64329}{\textbf{Gaussian Process Regression in estimating 2020 excess deaths}}\)

Our primary goal is to estimate excess mortality during 2020. This is defined as the difference between the observed and the expected number of deaths. Epidemiologists use excess mortality as a standard tool to understand the full impact of the pandemic. It does not only reflect deaths dirctly due to Covid-19 but also other causes associated with the pandemic, such as delayed or forgone care due to hospitals prioritizing Covid-19 emergencies or patients themselves staying away.

To define excess mortality requires agreeing on the baseline, i.e. the expected number of deaths if Covid-19 did not occur. A very simple method is to look at historical weekly mortality and then average. For example, one common metric is the median mortality over the previous 5 years, 2015-2019. However, this approach ignores trends in the data (specifically the mortality improvement factors whereby mortality gets better YoY), is sensitive to data outliers, and most importantly does not provide good uncertainty quantification. UQ is critical to answer the question of whether observed death counts are statistically significant or can be attributed to “random” fluctuations. Mortality data is intrinsically volatile and it is not straightforward to distinguish between frequent fluctuations that could happen versus an unambiguous Covid-19 spike.

Our analysis uses statistical learning, namely GP models to answer the above. To do so, we build a GP model on the historical data, taking for our training set Years 2010-2019, as well as Weeks 1-5 of 2020. We then use this training set to make probabilistic forecasts for Weeks 6-present of 2020 and compare this forecast to the observed mortality. The vertical blue line in the figure below delineates our training set from the forecast period.

The Figure below displays the raw weekly log-mortality rates (after adjustments listed above) for each country indexed by weeks of the year. The 2020 observed log-mortality series is highlighted in red.

To apply the Gaussian process modeling, we treat data as a 2-dimensional table, indexed by Weeks (1-52, coordinate 1) and Years (2010-2020, coordinate 2):

Our goal is to use a Gaussian Process model to:

  1. smooth log-mortality rates in Year and Week dimensions;

  2. perform out-of-sample predictions in 2020 starting from Week 5 to Week 22 (end of May 2020).

Note that GP is effectively a spatial model and does not have any time-series concepts embedded; it treats retrospective and future forecasts identically.

\(\color{#c64329}{\textbf{Sample Gaussian Process Regression for Sweden all-Age Mortality}}\) Below is an illustration on the application of Gaussian Process to analyze and forecast time-series data in Sweden.

 

\(\color{#c64329}{\textbf{Excess mortality}}\)

We can compare the GP-based mean forecast for 2020 against the observed weekly log-mortality rates to identify excess mortality. In the Figure below, the shaded region is the excess mortality starting at Week 11 (early March). Some countries have experienced much more severe impact than others. Observe that 2020 data ends at different weeks in different nations. We also note that in most countries there is a strong seasonal effect whereby mortality is expected to decrease as we enter the Summer months.

 

\(\color{#c64329}{\textbf{Credible Bands}}\)

To highlight the uncertainty quantification available with a GP model, we plot below the credible intervals around our GP-based 2020 forecasts. Below, we plot the mean forecasts for each country along with the 80% and 95% credible bands. We also show the 5-year medians and the observed 2020 rate for comparison purpose.

 

 

 

We note that in several countries, there is no statistically significant deviation from expected mortality. In other cases, such as Spain and USA, the Covid-19 spikes are clear.

A few remarks can be made relative to the historical 5-year median:

\(\color{#c64329}{\textbf{Excess deaths}}\)

Country Total Excess Deaths in 2020 Last Week Reported
Austria 217 25
Belgium 7497 25
Bulgaria -2154 26
Switzerland 683 24
Czech Republic -389 20
Germany -696 23
Denmark 114 26
Spain 39942 23
Estonia -26 26
Finland 113 24
France 17962 21
England & Wales 52088 26
Scotland 4433 26
Hungary -1453 23
Israel -192 21
Italy 46107 19
Lithuania -184 23
Netherlands 7706 25
Norway 84 25
Portugal 1124 25
Slovakia -388 17
Sweden 5452 26
USA 119799 23

\(\color{#c64329}{\textbf{GP Diagnostics}}\)

For completeness, we report below the fitted GP mean functions for each country, as well as the GP hyperparameters. We note that our basic model had difficulty finding appropriate lengthscales for E&W and was fitted manually for now.

Intercept Year trend A_1 A_2 A_3 A_4
AUT -6.0061 0.0007 0.0785 0.0531 0.0036 0.0306
BEL -1.5111 -0.0016 0.0917 0.0720 0.0095 0.0261
BGR -17.9613 0.0068 0.0936 0.0650 0.0275 0.0153
CHE -1.1434 -0.0018 0.0879 0.0456 0.0005 0.0293
CZE -12.1034 0.0037 0.0643 0.0478 -0.0024 0.0274
DEUTNP -16.2533 0.0058 0.0787 0.0645 -0.0067 0.0334
DNK 3.3360 -0.0040 0.0677 0.0569 0.0047 0.0194
ESP -26.6084 0.0108 0.1048 0.0805 0.0365 0.0419
EST -5.3941 0.0005 0.0758 0.0575 -0.0047 0.0245
FIN -13.8670 0.0046 0.0636 0.0436 -0.0009 0.0116
FRATNP -22.2320 0.0087 0.0988 0.0526 0.0129 0.0233
GBRTENW -13.5626 0.0044 0.1077 0.0152 0.0161 -0.0019
GBR_SCO -20.8501 0.0081 0.0913 -0.0189 0.0243 -0.0157
HUN -12.4153 0.0040 0.0845 0.0573 -0.0009 0.0284
ISR 2.3088 -0.0038 0.1123 0.0675 0.0131 0.0383
ITA -0.7686 -0.0019 0.1016 0.0635 0.0394 0.0476
LTU -9.5483 0.0026 0.0669 0.0626 0.0036 0.0112
NLD -23.0370 0.0091 0.0754 0.0570 0.0063 0.0169
NOR 20.7305 -0.0127 0.0858 0.0435 0.0117 0.0193
PRT -28.5257 0.0119 0.1410 0.0910 0.0403 0.0504
SVK -7.9539 0.0016 0.0605 0.0457 0.0012 0.0286
SWE 23.2619 -0.0139 0.0887 0.0569 0.0004 0.0229
USA -25.8126 0.0104 0.0721 0.0368 0.0148 0.0064

Hyper-parameters in the covariance function for each country:

Year Lengthscale Week Lengthscale Process variance Observ. Noise
AUT 0.0331 3.875 0.0019 0.0015
BEL 0.1210 1.978 0.0021 0.0012
BGR 0.0931 1.662 0.0027 0.0011
CHE 0.1140 3.307 0.0016 0.0011
CZE 0.0018 3.197 0.0018 0.0013
DEUTNP 0.0520 1.858 0.0023 0.0005
DNK 0.0777 2.856 0.0012 0.0011
ESP 0.1156 1.903 0.0023 0.0004
EST 0.0287 2.276 0.0014 0.0041
FIN 0.3633 1.705 0.0013 0.0011
FRATNP 0.1216 1.967 0.0016 0.0004
GBRTENW 0.1673 3.451 0.0031 0.0036
GBR_SCO 0.7294 19.434 0.0073 0.0032
HUN 0.0463 3.015 0.0023 0.0019
ISR 0.1254 2.540 0.0010 0.0016
ITA 0.3904 2.332 0.0030 0.0008
LTU 0.5754 1.809 0.0023 0.0020
NLD 0.0215 3.509 0.0015 0.0007
NOR 0.0558 4.550 0.0008 0.0017
PRT 0.1198 1.814 0.0030 0.0014
SVK 0.0000 2.824 0.0021 0.0019
SWE 0.1224 2.567 0.0011 0.0009
USA 0.3726 4.950 0.0009 0.0001